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Abstract 


Vector valued data appearing in concrete applications often possess sparse expan- 
sions with respect to a preassigned frame for each vector component individually. Ad- 
ditionally, different components may also exhibit common sparsity patterns. Recently, 
there were introduced sparsity measures that take into account such joint sparsity pat- 
terns, promoting coupling of non-vanishing components. These measures are typically 
constructed as weighted £; norms of componentwise lq norms of frame coefficients. We 
show how to compute solutions of linear inverse problems with such joint sparsity reg- 
ularization constraints by fast thresholded Landweber algorithms. Next we discuss the 
adaptive choice of suitable weights appearing in the definition of sparsity measures. 
The weights are interpreted as indicators of the sparsity pattern and are iteratively up- 
dated after each new application of the thresholded Landweber algorithm. The resulting 
two-step algorithm is interpreted as a double-minimization scheme for a suitable target 
functional. We show its /2-norm convergence. An implementable version of the algo- 
rithm is also formulated, and its norm convergence is proven. Numerical experiments 
in color image restoration are presented. 


AMS subject classification: 65J22, 65K10, 65T60, 90C25, 52A41, 49M30, 68U10 
Key Words: linear inverse problems, joint sparsity, thresholded Landweber iterations, 
curvelets, subdifferential inclusion, color image reconstruction 


1 Introduction 


Inverse problems. We address the problem of recovering an element u of a Hilbert space 
K from the observed datum g = Tu in the Hilbert space H, where T : K — H is a bounded 
linear operator, possibly non-invertible or with unbounded inverse. A simple approach to 
this problem is to minimize the discrepancy 


T (u) := ||Tu— |||’. 
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If ker(T) = {0} then there exists a unique solution given by u* = (T*T)~!T*g. However, if 
T has unbounded inverse, i.e., ([*7')~! is unbounded then this approach is very unstable. 

Thus, if T is non-invertible or has unbounded inverse (or an inverse with high norm) 
one has to take into account further features of the expected solution. Indeed, a well-known 
way out is to consider the regularized problem 


ux := aremin,exT(u) + allulK||?. 


for which the corresponding solution operator TÌ : g — už, is bounded. Unfortunately, the 
minimal norm constraint is often not appropriate. A recent approach is to substitute this 
particular constraint with a more general one 


uş := argmin ex7 (u) + O(u), 


where ® is a suitable sparsity measure. 

Sparse frame expansions. A sparse representation of an element of a Hilbert space 
is a series expansion with respect to an orthonormal bases or a frame that has only a small 
number of large coefficients. Several types of signals appearing in nature admit sparse frame 
expansions and thus, sparsity is a realistic assumption for a very large class of problems. 
For instance, images are well-represented by sparse expansions with respect to wavelets or 
curvelets, while for audio signals a Gabor frame is a good choice. 

Sparsity has had already a long history of successes. The design of frames for sparse 
representations of digital signals has led to extremely efficient compression methods, such 
as JPEG2000 and MP3 [33]. Successively a new generation of optimal numerical schemes 
has been developed for the computation of sparse solutions of differential and integral 
equations, exploiting adaptive and greedy strategies [38] [15]. The use of sparsity 
in inverse problems for data recovery has been the most recent step of this long career 
of “simplifying and understanding complexity”, with an enormous potential in applications 
[2} (1°7] [18] (20) (22) (27) (35) [39] (10) (73) (T6]. Another field, which caught much attention recently, 
is the observation that it is possible to reconstruct sparse signals from vastly incomplete 
information [7 [6] [23} [82 [36]. This line of research is called sparse recovery or compressed 
sensing. 

From sparsity to joint sparsity. Most of the contributions appearing in the literature 
are addressed to the recovery of sparse scalar functions. Multi-channel signals (i.e., vector 
valued functions) appearing in concrete applications may not only possess sparse frame 
expansions for each channel individually, but additionally the different channels can also 
exhibit common sparsity patterns. Recently, new sparsity measures have been introduced 
that promote such coupling of the non-vanishing components through different channels 
BI [40]. These measures are typically constructed as weighted £; norms of channel 44 
norms with g > 1. We will use this concept for the solution of vector valued inverse problems 
and combine it with another approach further promoting the coupling of sparsity patterns 
along channels. 

Our main results. We show how to compute solutions of linear inverse problems with 
joint sparsity regularization constraints by fast thresholded Landweber algorithms, simi- 
lar to those presented in [17] B5 [39]. We discuss the adaptive choice of suitable weights 
appearing in the definition of the sparsity measures. The weights are interpreted as indi- 
cators of the sparsity pattern and are iteratively up-dated after each new application of 


the thresholded Landweber algorithm. The resulting two-step algorithm is interpreted as a 
double-minimization scheme for a suitable target functional. We prove that our algorithm 
converges to its minimizer. Since the functional is not smooth, this is done by subdifferen- 
tial inclusions [87]. We prove that the thresholded Landweber algorithm, which constitutes 
the inner iteration of the double-minimization algorithm, converges linearly. This feature 
was not ensured by the versions in [789]. The second step of the double-minimization has 
actually a simple explicit solution. Finally, we show that the full exact double-minimization 
scheme converges linearly and we provide an implementable version which is also ensured 
to converge. 

Morphological analysis of signals and sparsity patterns. The use of sparseness 
measures not only allows to reconstruct a signal. At the same time it gives information 
about (joint) sparsity patterns which may encode morphological features of the signal. Well- 
known examples are the microlocal analysis properties of wavelets [80] BI] for singularity 
and regularity detection, and the characterization of edges and curves by curvelets for 
natural images [9]. For instance, the weight sequences appearing in the sparsity measures 
we define, and interpreted as indicators of the sparsity pattern, play a similar role as the 
discontinuity set is playing in the Mumford-Shah functional [34]. In fact, it is well-known 
that wavelet or curvelet coefficients have high absolute values at high scales as soon as we 
are in the neighborhood of discontinuities. Even more illuminating and suggestive is the 
parallel between the sparsity measure and its indicator weights with the Ambrosio-Tortorelli 
[I] approximation of the Mumford-Shah functional. Here, the discontinuity set is indicated 
by an auxiliary function which is 1 where the image is smooth and 0 where edges and 
discontinuities are detected. 

Joint sparsity patterns of vector valued (i.e., multi-channel) signals encode even finer 
properties of the morphology which do not belong only to one channel but are a common 
feature of all the channels. Here the parallel is with generalizations of the Mumford-Shah 
functional as appearing for example in [9| where polyconvex functions of gradients couple 
discontinuity sets through different color channels of images. 

Applications. We expect that our scheme can be applied in several different contexts. 
In this paper we limit ourselves to an application in color image reconstruction, modeling 
a real-world problem in art restoration. Indeed, color images have the advantage to be 
non-trivial multivariate and multi-channel signals, exhibiting a very rich morphology and 
structure. In particular, discontinuities (jump sets) may appear in all the channels at the 
same locations, which will be reflected in their curvelet representation (for instance). For 
these reasons, color images are a good model to test the effectiveness of our scheme promot- 
ing joint sparsity, also because the solution can be easily checked just by a visual analysis. 
Of course, the range of applicability of our approach is not limited to color image restora- 
tion. Neuroimaging (functional Magnetic Resonance Imaging, Magnetoencephalography), 
distributed compressed sensing [3] and several other problems with coupled vector valued 
solutions are fields where we expect that our scheme can be fruitfully used. The numer- 
ical solution of differential and integral operator equations can also be addressed within 
this framework and we refer for example to [38] for implementations by adaptive 
strategies. 

Content of the paper. The paper is organized as follows. In Section 2 we introduce 
the mathematical setting. We formulate our model of joint sparsity for multi-channel signals 


and the corresponding functional to be minimized in order to solve a given linear inverse 
problem. The functional depends on two variables. The first belongs to the space of signals 
to be reconstructed, the second belongs to the space of sparsity indicator weights. Convexity 
properties of the functional are discussed. Section 3 is dedicated to the formulation of 
the double-minimization algorithm and to its weak-convergence. The scheme is based on 
alternating minimizations in the first and in the second variable individually. In Section 4 
we discuss an efficient thresholded Landweber algorithm for the minimization with respect 
to the first variable. Its strong convergence is shown following the analysis in [I7]. The 
minimization with respect to the second variable has an explicit solution and no elaboration 
is needed. We provide an implementable version of the full scheme in Section 5. To prove its 
convergence we develop an error analysis. As a byproduct of the results in this section we 
show that the double—minimization scheme converges strongly. In Section 6 we present an 
application in color image reconstruction. Numerical experiments are shown and discussed. 


Nota on color pictures 


This paper introduces methods to recover colors in digital images. Therefore a gray level 
printout of the manuscript does not allow to appreciate fully the quality of the illustrated 
techniques. The authors recommend the interested reader to access the electronic version 
with color pictures which is available online. 


2 The Functional 


2.1 Notation 


Before starting our discussion let us briefly introduce some of the spaces we will use in the 
following. For some countable index set A we denote by lp = £)(A), 1 < p < ov, the space 
of real sequences u = (u) )aca with norm 


1/p 
lull = lull] = (Zar) , 1<p<o 


AEA 


and ||ulloo := supye, |ua] as usual. If (v)) is a sequence of positive weights then we define 
the weighted spaces lpw = lpw (A) = {u, (uxva) E &)(A)} with norm 


1/p 
llr» = [luleroll = Iuav) = (x si 


AEA 


(with obvious modification for p = oo). If the entries u) are actually vectors in a Banach 
space X with norm ||- ||x then we denote 


fpw(A,X) = {(ua)aeas a € X, (|luallx)aea E &p,0(A)} 


with norm |lu|p,.0(A, X)|| = |I(Iluallx)xcalép.o(A)||. Usually X will be RY endowed with 
the Euclidean norm, or the M-dimensional space £;", i.e., RM endowed with the ¢q-norm. 
By R+ we denote the non-negative real numbers. 


2.2 Inverse Problems with joint sparsity constraints 


Let K and Hj, j = 1,..., N, be (separable) Hilbert spaces and Ay; : K > Hj, j =1,...,M, 


L= 1,..., N, some bounded linear operators. Assume we are given data g; E€ Hj, 
M 
95 = Y Ariu j=1,..., N. 
f=1 
Then our basic task consists in reconstructing the (unknown) elements fe € K, £= 1,..., M. 
In practice, it happens that the corresponding mapping from the vector (fe) to the 
vector (gj) is not invertible or ill-conditioned. Moreover, the data gj, j = 1,...,N, are 


often corrupted by noise. Thus, in order to deal with our reconstruction problem we have 
to regularize it. 

Our basic assumption throughout this paper will be that the ’channels’ f;,2=1,...,M, 
are correlated by means of joint sparsity patterns. Our aim is to model the joint sparsity 
within a regularization term. In the following we develop this idea. 

For the sake of short notation we resume the data vector into 


g = (gj)j=1,. mM EH := Dx; 


where the Hilbert space H is equipped with the usual inner product (X` j gj? j hy) := 
DACIE h;) with gj, hj € Hj. We also combine the operators Ag j into one operator 


M 


A: Bk 4H, Alfil = = (Seat). 


é=1 j=l 
In order to exploit sparsity ideas we assume that we have given a suitable frame {wy : 
A € A} C K indexed by a countable set A. This means that there exist constants C1, C2 > 0 
such that 
Cull fl < Sof dad? < Calif for all f €K. (1) 
AEA 


Orthonormal bases are particular examples of frames. Frames allow for a (stable) series 
expansion of any f € K of the form 


f = Fu:= Doug (2) 


AEA 


where u = (uy)aea E€ £2(A). The linear operator F : €2(A) —> K is called the synthesis map 
in frame theory. It is bounded due to the frame inequality (I). In contrast to orthonormal 
bases, the coefficients u) need not be unique, in general. For more information on frames 
we refer to [I]. 

A main assumption here is that the fe to be reconstructed are sparse with respect to the 
frame {yw}. This means that fe can be well-approximated by a series of the form (@) with 
only a small number of non-vanishing coefficients u). This can be modelled by assuming 


that the sequence u is contained in a (weighted) ¢;(A)-space. Indeed, the minimization of 
the ¢;(A) norm promotes that only few entries are non-zero. Taking for instance a wavelet 
frame and a suitable weight, the 41 constraint implies that the element to be reconstructed 
lies in a certain Besov space By 1, see rq. 

Analogously as in [I7] such considerations lead to the regularized functional 
2 


N M M 
J(u) = |g- Tull? + lull Al = A l- Y Ags Full] +55 So hál, (3) 
j=1 ł=1 Hj 51 AGA 


which has to be minimized with respect to the vector of coefficients u = Gat: The 
liw norm in this functional clearly represents the regularization term. The numbers vy, 
A € A, are some suitable positive weights. Once the minimizer u = (us) is determined we 
obtain a reconstruction of the vectors of interest by means of fe = Fu’ = Y ` us Wy. The 
algorithm in can be taken to perform the minimization with respect to u. 

The functional J(u) in the form stated, however, does not necessarily model any cor- 
relation between the vectors (’channels’) fe, £ = 1,..., M. A way to incorporate such 
correlation is the assumption of joint sparsity, see also [40]. By this we mean that the 
pattern of non-zero coefficients representing fp is (approximately) the same for all the chan- 
nels. In other words, for some finite set of indexes Ag C A and for all 4 = 1,..., N there is 


an expansion 
fe 5 ukpa. 
AEAo 
In particular, the same Ag can be chosen for all f;’s. 
We propose two approaches (that can be combined) to model joint sparsity. The first 
one assumes that the mixed norm 


lulli olt, Ce) = do valua lla 
AEA 


of u = (u$) is small. Hereby, u, denotes the vector (uM , in R™. (Recall also that oM 


denotes R™ endowed with the lynorm). Here, q > 1 and in particular, q = 2 or q = ov, 
represent the interesting cases, since for q = 1 the above norm reduces to the usual weighted 
liw norm. In fact if q is large and some |us| is large then the channel entries us| are also 
allowed to be large for @’ # £, without increasing significantly the norm lux ||. The 
minimization of the above norm promotes that all entries of the ’interchannel’ vector u) 
may become significant, once at least one of the components Jus | is large. 

Introduce the operators Tp; = AgjF : L(A) > He and 


M N M N 
T :0(A,R“) +H, Tu = (Er) = (Dasr) 
f=1 j=l l=1 j=l 


The above reasoning leads to the functional 


K(u) = KP (u) = ||Tu — gH? + lulo (A, 27) 
2 


| (4) 


N || M 
= 5, X Teju — 95 +X oluna 
j=1 || e= Hj AEA 


to be minimized with respect to u. In Section Ø] we will develop an iterative thresholding 
algorithm similar as in [I7] to perform this minimization. 

The second approach to support joint sparsity is to encode the joint sparsity information 
in some sort of indicator function. This can in fact be done by using the weight (v)) as 
a second minimization variable. To this end we add an additional term to the original 
functional (3), punishing small values of v). We obtain the functional 


Jo(u,v) = Jy) (use) = [Pu gH? + X vyllualls + 37 oala — 09)? 
AEA À 


restricted to v) > 0. Here, ()), and (p)), are some suitable positive sequences. 

Now the task is to minimize Jo(u,v) jointly with respect to both u,v. (Again, once 
this minimizer is determined we obtain fy = Fu‘). Analyzing Jo(u,v) we realize that for 
the minimizer (u,v) we will have v, = 0 (or close to 0) if |ual = 5S4 |u| is large so 
that v,||u,||1 gets small. On the other hand, if ||w,||1 is small then the term 6)(p) — vy) 
dominates and forces v) to be close to py. Thus, v) serves indeed as an indicator of large 
values of ||w)||1. It has the effect, that if v) is chosen small due to one large uf then also the 
other coefficients us, , V Æ £ can be chosen large without making the functional considerably 
bigger. 

Unfortunately, in contrast to the previous functionals, J(u,v) as stated above is no 
longer jointly convex in (u,v) in general (although it is convex as functional of u and of v 
alone). Thus, it cannot be ensured that a local minimum of the functional will be a global 
one, a property that is very crucial for an efficient minimization method. 

To overcome this problem we may add an additional suitable quadratic term. Moreover, 
we can, of course, combine the second approach with the first one and use an ¢,-norm instead 
of an £;-norm for the ’interchannel’ vectors u). This leads to the most general form of the 
regularized functional considered in this paper, 


J(u,v) = JÈ (u,v) = Tu- gH + Y valluallg +X walluall3 + So aloa- va). 
AEA ACA AEA 
(5) 


Here, w) is a suitably chosen sequence of positive numbers, and 1 < q < oo. 

We will provide a sufficient condition depending on 0) and p) in the next subsection 
ensuring the strict joint convexity of J(u,v) in (u,v). Although there is an extra term, 
J(u,v) has similar properties as Jo(u,v). In particular, v can still be seen as a sort of 
indicator function. 

Observe that in the minimum we will always have 0 < v) < p). Therefore, we can assume 
the domain of J to be 2(A,R™) x £..,5-1(A)+ where ¢,,,-1(A)4 denotes the (convex) cone 
of all non-negative sequences (va) € C45,)-1(A). 

Our main contribution consists in providing an algorithm for the minimization of J(u, v). 
It consists in alternately minimizing with respect to u and with respect v. The minimization 
with respect to v can be done explicitly. For the minimization with respect to u we propose 
an efficient iterative algorithm. 

We will mainly study the problem in the real-valued case. The complex-valued case can 
be treated with the same methods (in principle) by observing that C™ is isomorphic to 


R2™ | so passing from M complex-valued channels to 2M real-valued channels. We note, 


however, that slight complications may arise from the fact that an @, norm on C™ is not 
isometric to an j-norm on R2™ if q ~ 2. (In particular, the thresholding operator on C™ 
for q = œ will have a different form than the one provided in the next Section for the 
real-valued case). 


2.3. Convexity of the functional J 


At several places in the following it will be convenient to write 


J(u,v) = T(u) +6 (u,v) (6) 
where 
N M 
T(u) = |Tu -gl = X IX Teze" — gill, 
j=l £=1 
PD (u,v) = X vlleralla +X walleealls +X aloa- va)? 
ACA ACA ACA 


2 
= W(erllualla)acalhs + ultze BO + lo- vezo, 


are the discrepancy with respect to the data and the joint sparsity measure, respectively. 
Also it is useful to observe that ® decouples with respect to A, i.e., 


o (u,v) = X dP (uy, 0) (7) 
ACA 
where 
DV (x,y) = yllella + walla + O(n — y)? (8) 
M 1/4 M 
= y (Ziar) +wr Y a? + O(p, — y), xreR¥,y>0 
l=1 l=1 


(with the usual modification for q = 00). 

In the following we give necessary and sufficient conditions for the (strict) convexity 
of the functional 6 for the most interesting cases q = 1,2,00. These imply sufficient 
conditions for the (strict) convexity of J = Ips 


Proposition 2.1. Let q € {1,2,co}. The sparsity measure © is convex if and only if 
WO, > 4 for all A€ A, where k = M for gq = 1, and k = 1 for q € {2,00}. In particular, 
if WO, > $ for all X € A then J is convex. In case of a strict inequality w,9, > Ẹ we can 
replace “convexity” by “strict convexity” in all of these statements. 


Proof. It is easy to see that ® is (strictly) convex if and only if all the ao, A € A, are 
(strictly) convex. 


Let us first consider g = 1. Observe that we can write b (x,y) = i F®(ze,y) 


with 


F®(z,y) = ylz| + wlz? + M710 loa- y)? (9) 
(ylz| + walz|? + Mty’) + (05 — 20y), zER,y >20. 


The function in the second bracket is obviously linear, hence convex. The function in the 
first bracket can be written as the composition G) o L with L(z,y) = (|z|, y) and 


1 
Ga (z, y) = yz +w afe M~'0yy? = zY) Ho (z,y)", Z,YEe R, 


where 


Since L is convex and has range R2, and G) is monotonically increasing in each coordinate 
on RŽ it suffices to show that G)(z,y) is convex if and only if #.w, > M/4, see e.g. Ø 
p. 86]. The convexity of G) is equivalent to H ) being positive semidefinite. The latter 
is clearly equivalent to 0,w, > M/4. Strict convexity is equivalent to a strict inequality 
Owy > M / 4. 

Now let q = 2. Observe that pP = FO o L?) where L?) : RM xR, —> R2, LO (x,y) = 
(llæll2,y) and 


1 
F®(z,y) = yz +w? + Or(p, — y}? = 59) HO) (z,y)" +00 —2pr.y), zyER 
(10) 


with 
(2) _ 2w) 1 
= ( 1 bas) 


By a similar argument as above 6?) is convex if and only if H®) is positive semi-definite. 
The latter is the case if and only if w,9, > 1/4, and strict convexity is equivalent to a strict 
inequality. 

Finally, let q = oo. Observe that 


M 
0) (x,y) = 2x fuia + wy X [Em]? + 8l — vv} 


m=1 


Since po) is the pointwise maximum of M functions, it is sufficient (see [Æ p. 80]) to 
investigate the (strict) convexity of each of the functions 


M 
Yre + wy >. (ae) + Ox(p, — y)” 


m=1 


1 . 
= 5(.2) HY (y,2)" + A(X — 2p), TERM y > 0, 


faala, y) 


with 


2w) O >- O de 

D Bey et 0 poy 

(co) ; . . 
Ay’ = : : _- % : 
0 0 ne 2w) OMe 

die Ooe +--+ Ome 260) 


One can show by induction that 
det (H{~) = 2MM- (40, = 1). 


Thus, H 1) is positive semidefinite if and only if 0,w, > 1/4, and the convexity of g) is 


equivalent to the latter condition. Once again strict convexity is equivalent to the strict 
inequality. O 


We do not pursue the task to obtain conditions for the convexity of ® and J for 
general q Æ 1,2,00, but rather assume that ® and hence J are always convex also in this 
case. 


3 The Minimizing Algorithm and its Convergence 


In this section we propose and analyze an algorithm for the computation of the minimizer 
(u*,v*) of the functional J(u,v) = ID (u,v) defined in (5). The algorithm consists in 
alternating a minimization with respect to u and a minimization with respect to v. More 


formally, for some initial choice v), for example v© = (py)yeq, we define 


ul) := arg min,,ce,(A,RM) J(u, gery), 
n) n), v). 


. 11 
v”) := arg minyeg -1 (A)+ J(ul (11) 


The minimization of J(u,v®®) with respect to u can be done by means of the itera- 
tive thresholding algorithm that we will study in the next section. The minimizer v\” 
of J(u”, v) for fixed u” can be computed explicitly. Indeed, it follows from elementary 
calculus that a 
yw = { pa — ae luaa if [lu allq < 28.0 (12) 
à 0 otherwise . 
We have the following result about the convergence of the above algorithm. 


Theorem 3.1. Let 1 < q < œ and assume that 4) and hence J are strictly convex (see 
also Proposition[Z.1). Moreover, we assume that £ 1/2 (A, RM) is embedded into €2(A,R™), 
i.e., w > y > 0 for all `AE A. Then the sequence (uh ow) nen converges to the unique 
minimizer (u*,v*) € f2(A,R™) x €,,-1(A)4 of J. The convergence of ul™ is weak in 
o(A,R™) and that of v\™ holds componentwise. 

For the most interesting cases q € {1,2,00}, if in addition Owa > o > ¢,/4 for all A € A, 
where od, = M, 62 = 1, go = VM then the convergence of u™ to u* is also strong in 
L(A, RY) and v™® — v* converges to 0 strongly in 2,6(A). 


The rest of the section will be spent with the proof of the weak convergence of the 
algorithm. The strong convergence and the full proof of the Theorem BJ] will be established 
only in Subsection 5.3 later. 
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3.1 Subdifferential calculus 


A main tool in the analysis of non-smooth functionals and their minima is the concept 
of subdifferential. Recall that for a convex functional F on some Banach space V its 
subdifferential OF (x) at a point x € V with F(x) < oo is defined as the set 


OF (x) = {x* €V*,a*(z-—2)4+ F(x) < F(z) for all z € V}, 


where V* denotes the dual space of V. It is obvious from this definition that 0 € OF (x) if 
and only if x is a minimizer of F. In the following we investigate the subdifferential of J. 
In order to have J defined on the whole Banach space £)(A,R™) x La, p-(A) rather than 
just for positive v)’s (which is needed to use subdifferentials) we simply extend J(u,v) by 


J(u,v) = œ if v, < 0 for some A € A 


as usual. This extension preserves convexity and does not change the minimizer. 
Recall that J can be written as J(u,v) = T (u) + 6 (u,v), see (6). Since both 7 and 
() are convex we have, see e.g. [24] Proposition 5.6], 


AJ (u,v) = OT(u) x {0} + 06 (u,v). (13) 
Concerning the subdifferential of 7 we have the following result. 


Lemma 3.2. The subdifferential of T at u € 2(A,R™) consists of one element, 
OT (u) = {2T* (Tu — g)} . 


Proof. Since T is convex and Gateaux-differentiable, by Proposition 5.3 we have OT (u) = 
{T'(u)} , where its Gateaux-derivative is characterized by (T'(u), z) = limp_.o+ Fluthe)—T(u) 
for all z € €(A,R™). It is straightforward to check that the Gateaux derivative of a 
functional of the type u — ||Tu — g||? (with linear T) at u applied on z is given by 


2(Tu —g,Tz) = 2(T*(Tu — g), z}. This proves the claim. O 


Let us now consider the subdifferential of 06 (u,v). Recall its domain (A,R™”) x 
lyo,9-1(A). Since the dual of ¢,,,-1 is a bit inconvenient to handle we restrict the sub- 


differential to the predual 4/1. This will be enough for our purposes. Moreover, recall 


that 6 decouples into a sum of functionals go depending only (u), va), see (J. It is 


straightforward to show the following lemma. 


Lemma 3.3. The subdifferential of ® at a point (u,v) € L(A, R™) x loop (A) with 
P) (u,v) < œ satisfies 


DEV (u,v) = 86M (u, v) N (L(A, R) x 4.(4)) 
= {(€,n) € L(A, R”) x L(A): (&, m) € 0® (ug, va) for all A € A}. 


We are left with investigating the subdifferential of the functional g defined in (8). 
Similarly as J we extend it to RY x R by go (x,y) = 60 for y <0. 
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Lemma 3.4. Let 1 <q <œ. Assume that go is convex (see also Proposition). Then 
for (x,y) E R™ x Ry we have 


06 (x,y) = {(E,n) ERY xR: £E yðll lal) + 2waz, 7 € |l2rl|q0s* ly) + 20(y — pr)}- 
(14) 


where s*(y) := y for y > 0 and st(y) = œ fory < 0. In particular, Os*(y) = {1} for 
y > 0 and Os* (0) = (~, 1]. 


Remark: We recall that the subdifferential of the q-norm on R™ is given as follows. If 
1<q<o then 


BY (1) if x = 0, 
2 Ea , M 
Əl- lz) = { (ca) seen 
Iq é=1 


where B7 (1) denotes the ball of radius 1 in the dual norm, i.e., in ly with 1/q+1/q' =1. 
If q = 1 then 


All -In(z) = {€ ERY: & € 3|- (ze), £ =1,..., M} (15) 
where O| - |(z) = {sign(z)} if z # 0 and O| |(0) = [-1, 1]. 
If q = œ then 
_ | Ba) if x = 0, 
alle = { conv{ (sign (xe)eg : |xe| = ||z||.0} otherwise, te) 


where conv A denotes the convex hull of a set A and eg the &th canonical unit vector in 
RM. 


Proof. Recall that 
6 (x,y) = s+(y)lzlla + walla} + (pa — y). 


Let y > 0 so that g (x,y) is finite. The subdifferential a(d) (ax, y) of ® (x, y) consid- 
ered as a function of x alone (i.e. for fixed y) is clearly given by 


a(d), (x,y) = yðll + |lq(w) + Qwazx (17) 


while keeping y fixed gives 


a(S) (x,y) = Os*(y)|lellq + 28l — py): 


This shows the inclusion ’C’ in Q4). Moreover, for all the points (x,y) € RY x R} where 
g is differentiable we even have equality in (14) since g is convex and, thus, all the 


subdifferentials appearing consist of precisely one point, i.e., the usual gradient. 
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Let 1 < q < œ. Then for x 4 0, y > 0 the differentiability assumption is clearly 


satisfied. For the other cases x = 0 or y = 0 we note that by convexity of go we have (see 


B7 Corollary 10.11]) 


a(d) (x,y) = {£ : In such that (£, n) € aa (x,y)} (18) 


and the corresponding relation for AO), (x,y). Now, if y > 0 then go (x,y) is differ- 
entiable with respect to y and thus, 7 in the right hand side of (I8) is unique, indeed 


n = no := Zo” (x,y). We conclude that for y > 0 


a(@) (x,y) = {(E 10), € € AO), (x, y)} 


In particular this holds for « = 0, even for general 1 < q < œ. The same argument 
applies for the case y = 0 and x Æ 0 (and 1 < q < ov), which shows (Q4) in these cases. 
Now let x = 0 and y = 0. Then the right hand side of ([4) contains precisely one point, 
i.e., (€,7) = (0, —26)p)). Since the subdifferential a (0, 0) contains at least one point by 
convexity, it must coincide with (£, n) by the trivial inclusion °C’. (It is easy to check also 
directly that (0,—20,p,) € 6 (0, 0)). Note that this argument applies also for q = 1, 00. 

It remains to treat the cases q = 1,00 with x Æ 0 and arbitrary y > 0. Let us start with 
q = 1. In the proof of Proposition it was noted that 


with F (1) : R? — R defined in QQ). The subdifferential of F\ can be obtained in the same 
way as above (expressing e.g. formally the modulus as a 2-norm on Rt). For (z,y) € Rx Ry 
this yields 


OF. (z,y) = {(r.m) + 7 € vð (2) + 2wz, n € [215% (y) + 2M 0(y — pa)}- 


By convexity we have 


M 
a0) (x,y) = > { (erze, n) > (2an) € OF, (2ey)} 
l=1 


where ep denotes the ¢-th unit vector in RY. By the explicit form of the subdifferential of 
the @;-norm this gives (4) for q = 1. 
Finally, let q = œ. Similarly as in the proof of Proposition 2.1] we write 


o)(x,y) = max F(x, y) 


with 
F(x, y) = yle + ollel? + Or(0, — y). 
If xe 4 0 then F(x,y) is differentiable with respect to z and 


OFy(x,y) = {(€,) : E = ysign(xe)eg + Qwyx, y € Ost (y)|xe| + 20, (y — py)}, 
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where ep denotes the ¢-th canonical unit vector in RM. This even holds for y = 0 by an 
analogous argument as above, see (18). The subdifferential of pr) (x,y) for z Æ 0 is then 
given by (see e.g. [B7] Exercise 8.31]) 


ae) (x, y) = conv {OF;(z, y) : F(x,y) = max Fim (a, y)}- 


m=l1,..., 


Since x # 0 we have x; Æ 0 if |x| = |z| and the latter is the case iff F(x,y) = 
max, Fm(z£, y). Thus, we obtain 


AO) (x,y) 
=conv |] {(&m): €= ysign(ze)er + wax, € Ost (y)|ael + 20,(y — pr)} 


€:|a¢|=||2||o0 


{€ n) : € € conv{ysign(xe)ec, (ze = llelo} n € |||]098" (y)} + (Qua, 20a(y — pr)} - 


By the characterization of the subdifferential of the co-norm in (16) we obtain the claimed 
equality in (T4) for q = œ and x £0. This finishes the proof. oO 


Combining the previous lemmas we obtain the following result. 


Proposition 3.5. Let 1<q< œ. Assume that ®® is conver and let (u,v) € f2(A,R™) x 
Lop- (A) such that (u,v) <o. Then we have 


DOM (u,v) = {(E,9) € L(A, R”) x f1p(A), £x € vyA| - [lq(ua) + 2wauy, 
m E luallgðst (va) + 20A (V) — Px); AE A} 
c 06 (u,v) (19) 
and 


DJ(u,v) = OF (u,v) N (L(A, R”) x &p(A)) = (2T*T(u—g),0) + DOM (u,v) C J(u, v). 


3.2 Weak convergence of the double-minimization 


Before we actually start proving the weak convergence of the algorithm in (LL) we recall 
the following definition [37]. 


Definition 1. Let V be a topological space and A = (An)nen a sequence of subsets of V. 
The subset A C V is called the limit of the sequence A, and we write A = lim, An, if 


A= {a €V : da, E An,a = liman}. 


The following observation will be useful for us, see e.g. [B7] Proposition 8.7]. 


Lemma 3.6. Assume that T is a convex function on RY and (an) C R™ a convergent 
sequence with limit x such that T (£n), T(x) < co. Then the subdifferentials satisfy 


lim OP (£n) C OL (2). 


In other words, the subdifferential OI of a convex function is an outer semicontinuous set- 
valued function. 
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In the following we agree on the convention that the upper index n at u™® € b(A, RM) 


always denotes the n-th iterate and ul) € RM denotes the (vector-valued) entry at À of 
the n-th iterate. In the following proof we will never refer to the ¢-th component of the M- 
dimensional vector ul”, so hopefully no confusion can arise. Also, we denote by (DJ(u, v)) 
the restriction of DJ(u,v) C f(A, R™) x 1,,(A) to the index à. By the previous section it 
holds 


(DJ(u,v)), = 2L°L(u—g))a,0) + OB) (ua, va) (20) 
Now the proof is developed as follows. First, we recall that (u*,v*) = argmin J(u, v) if 


and only if 0 € 0J(u*,v*). Next, we show that there exist weakly convergent subsequences 
of (u,v) (again denoted by (u\”),v™)) which converge to (ul), v(°)) and that 


0 € lim DJ(u™, v&™) C AT(ul™), v6). (21) 


Due to the strict convexity of J we conclude that (ul), v%)) = (u*,v*). Now, let us detail 
the argument. 
By definition of u(™ and v\™ we have 


Tu , ov) — Ful), yl D) 
= J(u ov) — Fu FY, u) + Tul FY u) — Tul YD, yD) > 0. 


Thus, (J(u), v()), is a nonincreasing sequence, and since J > 0 this implies that 
(J(u, v\)), converges. Moreover, 


Fu y) > Fu, v™) > Y wally” h. 
ACA 


Therefore, (u(”)),, is uniformly bounded in £5 1/2 (A,R™) and thus, there exists a subse- 
quence (u!"*));, that converges to ul) € £5 wyt/2 (A,R™) weakly in both £5 91/2 (A,R™) and 
fo(A,R™), due to our assumption w) > y > 0 for all A € A. For simplicity, let us denote 
again u") = yl”), 

First of all, observe that weak convergence implies componentwise convergence, so that 
us”) = sor) and [T*Tu™], > [T*Tul©)], for all A € A. By the explicit formula (IZ) for 


yw this implies that ví”) converges pointwise to the limit 


(22) 


oO) = mo = L AT Belle alla if [lu alla < 2p, 
À n À 0 otherwise. 


By definition of u™ in we have 0 € Ju (u, v™) (where Ju (u,v) denotes the subdif- 
ferential of J considered as a functional of u only). This means that 
(Ty) — (n—1) 9) (n) (n) 
0 € |2T* (Tu g) < +v Oll lalu ) + 2u,u; for all A € A, 
see also Lemma[B.2Jand (J), in other words 


(n—1 


0= [2r Tu = 9) to) 1) 4 Bon, (23) 
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for a suitable a € Oll- lau®™). Now, let ((™) n) € DJ(u®™,v™). By definition of DJ 
and by (23) we have 


E © [2T*(Tu™ — gh] +u al- [lq (UH) + Ayu’ = vPAI] u) — oP, 


for a suitable ¢ (n) € Oll- lalu). Since yw converges it is possible to choose the sequence 
€( such that limpo e = 0 for all A € A. From (23) it is straightforward to check that 


0 € Ast (uv) [ul l + 20,(ve° — py) forall AE A, (24) 


and similarly 
0 € Ast (uv) [ue la + 20,(0 — py) 


We can choose 7”) = 0 so that lim, a? = 0 for all A € A. Altogether we conclude that 
0 € lim,(DJ(u™, v\™)), for all A € A. By continuity of T and Lemma [Bb] we conclude 


0 € lim arr — g))x,0) + 0B (u,v) 
C (2T*T (ul) — g)y,0) + 8 (ue, V) = DIU), vo), 


for all A € A. It follows that 0 € DJ(ul™), v(®)) c AJ (ul), vu), the latter inclusion by 
Proposition (35). Hence, by strict convexity (u*,v*) = (u,v). With this we have 
shown the weak convergence of the sequence u”) to u*. 

To establish the strong convergence we need to develop a more detailed analysis of the 
minimization of J with respect to u. Next section is devoted to this end, and it will allow 
us to use some further tools for the full proof of Theorem B.I]in Subsection 5.3. 


4 An Iterative Thresholding Algorithm for the Minimization 
with Respect to u 


One step of the minimization algorithm in the previous section consists in minimizing 


Jv) = I 


(u,v) for some fixed v. Moreover, keeping v fixed is also interesting for 
its own — in particular, if one is interested in minimizing the functional K = Ko defined in 
(26). Indeed, for w = 0 and p = v we have ae g(t, v) = KO (u). As we will describe in the 
following this minimization task can be performed by a thresholded Landweber algorithm 
similar to the one analyzed by Daubechies et al. in [I]. 

With v fixed our task is equivalent to minimizing 
K(u) = K9, = ||Tu— gll} + (u) (25) 


with respect to u € l2(A,R™) where 
F(u) = T (u) = J vallualla +X walleall3- (26) 


AEA AEA 
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We assume that T is non-expansive, i.e., |T]|| < 1, which can always be achieved by rescaling. 
Also we suppose that K is strictly convex. This is ensured if e.g. the kernel of T is trivial 
or w) > 0 for all A> 0. 

We define a surrogate functional by 


Ke(u,a) := K(u) —||[Tu-Tallz, + lu— alld = Tu- gll} + Yu) — |[Tu— Talli + llu- all. 


Since ||T|| < 1 also K® is convex, see for a rigorous argument. Now starting with some 
wu € b(A, R”) we define a sequence u™) by 


u™) = arg min K%(u,u") 


ucl? (A, RM) 


The minimizer of K°(u,a) (for fixed a) can be determined explicitly as follows. First, we 
claim that 
argmin K*(u,a) = Uy(a + T*(g —Ta)), 
u 


where the “thresholding” operator Uy is defined as 


U, = i — zl + T(z). 27 
y(u) er z| + V(z) (27) 


Indeed, a direct calculation shows that 
K*(u,a) = ||\Tu— gll} — ||Tu — Tally + |lu — all + X (u) 
(a + T*(g — Ta)) — ull} + X (u) — la + T* (g — Ta)? + llalli — Walle + lall- 


Since the last terms (after ¥(u)) do not depend on u they can be discarded when minimizing 
with respect to u, and the above claim follows. (The same argument works also for general 
’sparseness measures’ Y). Thus, the iterative algorithm reads 


ory — Uy (ul™ +T*(g— Tul™)), (28) 


In the following we give more details about Up and analyze the convergence of this algo- 
rithm. 


4.1 The thresholding operator 


Let us derive more information about Uy for our specific Y = W,., in (6). We have the 
following lemma. 


Lemma 4.1. Let 1<q< oo. It holds 
URU) = Uyo (wa = (1+ ,)-18 (uy), 


where 
sf (a) arg min z z||2 v|| zla; TE R? t (29) 
zERM 


Furthermore, 3 is given by 


SP) =a P e), (30) 
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where Da denotes the orthogonal projection onto the norm ball of radius v/2 with respect to 


the dual norm of ||: ||q, i.e., the ||: ||q-norm with q' denoting the dual index, 1/q+1/q' = 1. 
(The analogous result holds also if the norm ||- ||q is replaced by an arbitrary norm on R™ ). 


(a) decouples with respect to 


Proof. For Yew the minimizing problem defining Uy = 
A € A. Thus, we have 


(USB (z)), = arg min ||x — 2||3 + walla|l3 + vallzlla- 
, zERM 


If z minimizes the latter term then necessarily 0 € 2(1 + w))z — 2x + v)9O|| - ||~(z) where 
O|| - ||¢ denotes the subdifferential of the g-norm. In other words, 


VU 
(1+ wy)z—-# € ~S]|- lale). 


Since || - ||, is 1-homogeneous we have 9]| - ||¢(z) = Ol] - ||g((1 +w,)z). Setting y = (1+wy)z 
gives y — x E€ —O|| - ||q(y), which is the above relation for w, = 0. From this we deduce 
the first claim. 

Let us show the second claim, i.e., the explicit form of the operator s9. We already 
know that if z minimizes the left hand side of (29) then z — z € 0$]|z||q. Let Y(z) = $|lzllq 
and w* be its Fenchel conjugate function defined by w*(y) = sup,((x,y) — f(x)). It is 
well-known [4] p. 93] that 


0 if |lylly < v/2 
oo otherwise 


vy) = X Ba (v/2) (Y) = { 


Here BY (v/2) denotes the norm ball of radius v/2 with respect to the dual norm of ||- ||q- 
It is a standard result, see e.g. [B7] Proposition 11.3], (24) Corollary 5.2], that w € Əy(y) if 
and only if y € Ow*(w) yielding z € Ow*(x — z) in our case, and hence, 


z Eg- z+ 0p (x-— z) = x- z+ OX p¢'(y/2)(€ — 2): 


Now if y € w+ 8X pa! (u/2) (w) then it is straightforward to see that w must be the orthogonal 
projection of y onto BY (v/2), i.e., w = arg min yeg? (y/2) |W — yll2, see also [37] Example 


10.2 and p. 20]. For our situation this means that x — z = Pie), ie, Z = £— Ba (x). 
This shows the second claim. 
Clearly, all arguments work also for a general norm rather than the q-norm. O 


Let us give si explicitly for q = 1,2, co. 
Lemma 4.2. Let x € RY andv> 0. 


(a) For q=1 we have gi) (x)= (sP (xe))}£; where for y € R 


D (y) = 0 if [yl < ai 
sign(y)(ly|— 5) otherwise. 
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(b) For q = 2 it holds 


0 if ||æll2 < 3, 
Sy (@) =f (lzll2—v/2) ; ° 


otherwise. 
lall2 


(c) Let q= co. Order the entries of x by magnitude such that |x;,| > |xi,| >... > |ri,,|- 


1. If ||z|]1 < v/2 then Si) (x) =0. 
2. If |æl|ı >v/2, letn € {1,..., M} be the largest index satisfying 


1 = v 
[zin] 2 oo 3 [i| — z) . (31) 


k=1 


Then 


a sign(z;,) [<~ v l 


k=1 
(SEa) = ti, jontl,...,M. 


Proof. (b) The projection Pj(2) of x onto an 4> ball of radius v/2 is clearly given by 


T if llæll2 < v/2, 


Poja (€) = i v/2 


x otherwise . 


Since by the previous lemma sP (z) =a2—- P* (x) this gives the assertion. 
(a) Although this is well-known we give a simple argument. For q = 1 the functional in 
(29) defining S decouples, i.e., 


M 
(1) = : -pe 
Sy (x) = arg min D (lze — z|? + v|z¢)) . 


Thus, SP (2) = argmin,,er|z¢ — x|? + v|æe| for all £ = 1,...,M. The latter can be 
interpreted as the problem for q = 2 on R! and hence, the assertion follows from (b). 


(c) If ||z||1 < v/2 then Phja(2) = x and by the previous lemma gi) (x) =a2- Pa) = 
(co) 


0. Now assume ||z||ı > v/2. Let z = Sè (x). This is equivalent to 0 being contained in 
the subdifferential of the functional in (Z9) defining S$. This means 


2(z — x) E€ —vO|| - |loo(z). (32) 


We recall that the subdifferential of the maximum norm is given by (16). 

Now assume for the moment that the maximum norm of z is attained in 2,,..., Zin- 
We will later check whether this was really the case. Further, we assume for simplicity 
that all the entries 7;,,...,2;,, are positive. (The other cases can be carried through in the 
same way). Then certainly also the numbers z;,,..., Zi, are positive because choosing them 
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with the opposite sign would certainly increase the functional defining has Then by (T6) 
we obtain 2(z;, — xi) = 0 for the entries z;; not giving the maximum, i.e., zi = ti, j = 
n+1,..., M. 

Moreover, if n = 1 (i.e., the maximum norm of z is attained at only one entry) then 
2(zi, — £) = —v, in other words, z} = x;, — v/2. Thus, the initial hypothesis that the 
maximum norm of z is attained only at z; is true if and only if the second largest entry £i, 
satisfies |x;,| < |z| — v/2. 

So if the latter inequality is not satisfied then the maximum norm of z is at least attained 


at two entries, i.e., n > 2. In this case by (16) the entries z;, = z;, =--- = Zi, = t satisfy 
2t — 2x; = —vaj, j=1,...,n— 1, 
n-1 
2t—2x;, = —v (: = `> a) 
k=1 
for some numbers aj,...,@,—1 € [0,1] satisfying }?;,a; < 1. This is a system of n linear 
equations in t and a1,...,a@n—-1. Writing it in matrix form we get 
1 v/2 0 0  -s 0 t Tii 
1 0 v/2 0o  -s 0 ay : 
a 2. f = & l bee's 
1 —v/2 —v/2 —v/2 +--+. —v/2 On—1 Xi, — v/2 


Denoting the matrix on the left hand side by B, a simple computation verifies that 


1 1 1 1 
2(n—1) _2 2 2 
a i| 2 amy _2 2 
Boe |, =? DOU Ty 
n . . . . 
2 -a ina _2 

Vv Vv 

This gives 
Zii vee Ži 


1 n 
ME r zi; — v/2 
j=l 


and aj = 2 (v/2 + (n — Vari, — Ereti, nhi) Tir 
JE ona} 


1 
Ti; = -i > Zi, — v/2 
ke{1,.. n} \ {i} 


Moreover, a impie calculation gives Da = aj = nt + 2 oe Ti; (i Lz) . Thus, 
it holds 1 — ee , a; > 0 if and only if 
1 n—1 
®in 27 dH u/2 
= 


Therefore, the initial assumption that the maximum norm of u is attained precisely at 


Zii- - -3 Žin can only be true if x;,,...,2;,, are the largest entries of the vector x and 


n 


1 n 
ai = Aa ) Ti; — v/2 
j=1 


ie., (Vee | < nY j= |z; | — v/2). Pasting all the pieces together shows the assertion of 
the lemma. oO 


4.2 Weak convergence 


In the following we will prove that u” converges weakly and strongly to the unique min- 
imizer of K. We first establish the weak convergence. Following the proof of Proposition 
3.11 in [I7] one may extract essentially three conditions on a general sparsity measure V 
such that weak convergence is ensured. Let us collect them in the following Proposition. 


Proposition 4.3. Assume K is given by (23) with a general sparsity measure V and suppose 
K is strictly conver. Let Uy be the associated ‘thresholding operator’ given by EI. Assume 
that the following conditions hold 


(1) Uy is non-expansive, i.e. ||Uy(x) — Uw(y)|l2 < |lz — ylļ2 for all x,y € f(A, R™). 


(2) It holds ||f\lz < H(W(f)) for all f € L(A, R) and some monotonically increasing 
function H on R4}. (This ensures that a sequence fn satisfying Y(fn) < C is bounded 
in €o(A,R™)). 


(3) For all x, h € fo(A,R™) it holds 
UW (Ug (x) +h) — (Ug (x)) + 2(h, Uy (x) — x) > 0. 
Then the sequence u™ defined by (23) converges weakly to the minimizer of K indepen- 
dently of the choice of u©. 


Proof. First we claim that the condition in (1) implies that the surrogate functional K 


satisfies 
K*(u+h,a) — K*(u,a) > |l? (33) 


for u = arg miny K*(u',a) = Uw(a —T*(g — Ta)). Indeed, set x := a — T* (g — Ta), i.e., 
u = U(x). Then an elementary calculation yields 
K*(u+h,a) — K%(u) =||T(u+h) — gll + (u + A) — ||T(ut h) — Tall? + u +h — all? 
Pu — gll — U(u) + ||Tu — Tall3 — llu — all3 
=2(h,u — a — T* (g —Ta)) + U(ut h) — (u) + |All? 
=2(h, Uy (£) — 2) + U(Uy(x) +h) — Y(Uy(2)) + Iil? > All 


The relation in (1) was used in the last inequality. 

Now with (83) and properties (2) and (3) one can easily justify that the proofs of the 
analogues of Theorem 3.2 until Proposition 3.11 in [I7] go through completely in the same 
way, which finally leads to the statement of this proposition. O 
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(a) 


Let us now show that for our specific choice of Y = Pya properties (1) - (3) in the 
previous Proposition hold, and thus, u‘") converges weakly to a minimizer of K. 


Lemma 4.4. Ul, = Uy is NON-eXPANsive. 


Proof. Clearly, the map x +> (1+w))71z is non-expansive. By (80) we have gi = I- Pha. 


a 
Since P4 


0/2 is an orthogonal projection onto a convex set also gi is non-expansive, see e.g. 


[39]. Hence, ui, is non-expansive since on each component x),  € A, it is a composition 


of non-expansive operators. O 


Lemma 4.5. If (và) or (wy) are bounded away from 0 then condition (2) in Proposition 
[7-9 holds. 


Proof. This follows by a standard argument. O 


If we consider the problem of minimizing J(u, v) jointly over u and v then we certainly 
cannot assume that v is bounded away from 0, but in this case we require that w) is bounded 
away from 0. (By Proposition BJ] this is needed anyway to ensure that J(u, v) is jointly 
convex in u and v). In the case where we only minimize J(u, v) with respect to u (i.e., when 
minimizing Y (u) defined in @6)) we may take w) arbitrary (and even w) = 0) but then we 
have to require a lower bound on vy. 

Now consider the third condition in the Proposition. The next lemma shows that it 
suffices to prove it for gi, i.e., for w, = 0. 


Lemma 4.6. Assume that for all z,h € R™ it holds 
v(|| S$? (x) + Alla — IIS? @)lla) + 2(h, SY? (@) — 2) > 0. 
Then condition (3) in Proposition F.J is satisfied. 
Proof. By definition of wl, we need to show that for w,v > 0 and all z, h € RY 
v(a +. w) SP (x) + Allg — IA +)“ tS} (@)llq) 
Hel ta S$ (a) + All — ||. +) TS}? (a) ||) + 2(A, (1 + w) 1S (a) — 2) > 0. 
Setting h’ = (1+w)h we obtain for the left hand side of this inequality 


1 +w) (I) (x) + rla — 1S (la) 


( 
+w) ew (ISP) +W- ISON) +20 + w) 2H’, $0 (x) — 2) 


1 


Hw)! [osL (a) + h'lg — ISO (Ell) + XR, S® (£) — 2)| 


( 
(1 


+ 


+) Pe [ISP (x) + HB — SEEN- A, 8 (w))] > A wlk > 0. 


This completes the proof. O 
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Lemma 4.7. The condition in the previous lemma holds for so, 1<q< oo (and even if 
the lq norm is replaced by a general norm on RY). 


Proof. First note that by definition (29) and duality we have 


ISS? @)lla = (v/2)"+ up (k, x — P42) 
kEBY (v/2) 


A characterization of the a projection tells us that (k — P? te re Pi) < 0 for 
all k € BY (v/2), see e.g. BJ] Lemma 8]. This gives 


[SPO = 0D sup ((k — Piar, z= Piala) + (Pial), SPE) 


sW EN @).- 


Using once more that sO (a )—-r=-P? 


»/2() we further obtain 


v(ISh (a) + Allg ~ 1So(@)(@)|lq) + 2(h, SP (x) — 2) 


> vl] S (2) + Allq — (P22), 94 (a)) — 2(P% (x), h) = vl] S (x) + Allg — 2(P%o(x), S (a) +h) 
> oS (2) + Alla — 2P allg ISP (a) + hla > 0. 

Hereby, we used that ae is a projection onto BY (v/2), so ||P nl < v/2. This finishes 

the proof. O 


To summarize we have the following result about weak convergence. 


Corollary 4.8. Let 1 < q < œ and assume that (v)) or (w\) is bounded from below. Then 
the sequence u”) defined in (28) converges weakly to a minimizer of K, where Y = yi, 
is the sparsity measure defined in (26). (The q-norm in (26) can be replaced by any other 


norm on R™), 


4.3 Strong convergence 
The next result establishes the strong convergence. 


Proposition 4.9. Let 1 <q < œ and assume that (v)) or (wy) are bounded away from 0. 
In case (vy) is not bounded away from 0 assume further that there is a constant c > 0 such 
that vy < c for only finitely many à. Then u™ converges strongly to a minimizer of K. 


Proof. The analogues of Lemmas 3.15 and 3.17 in are proven in completely the same 
way. It remains to justify the analogue of [I7] Lemma 3.18]: If for some a € b(A, RM) 
and some sequence (h™) C é:(A,R™) converging weakly to 0 it holds limo IU (a + 
h) — Ul? (a) — h™|j2 = 0 then ||h°”) ||, — 0 for m — oo. To this end we mainly follow 
the argument in [17]. 

Let c be the constant such that v) < c for A € Aoo for Ago finite. Then let Ag; be a finite 
set such that )7\¢a\Ao, aall < 7 for some o < ¢/2. (Such a set Aoi exists since ||- ||” and 
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||- ||z are equivalent norms on by assumption a € fo(A,R™)). Since Ag = Ago U A01 
is also finite, we have J yeo ni m) \|3 — 0 for m — oo by the weak convergence of h™ to 


miz — 0 for m — oo. 


0. Thus, we are left with proving that ) yeaa IA 

For each m we split Ay := A \ Ao into the subsets Ay m := {A € Ay: yn”? + ally < 
vy /2} ve Aim = = Ay \Aim AEA . a then Ul? (a +A) = O la ), = 0 since 
laa + AY” Ilg, lanle < va/2- Thus, Ih” — UÈ (a + h™), + USB (a Ji- g and by 
assumption, 


D APIR < an -UR (a +h) + UM (a3 +0 as m— oo. 
ACAI 


Now let A € Aim. We first consider the case that w) = 0, i.e., Uv wqx) = sg (x). Since 
llall < o < vy/2 we have SO (a) = 0, and thus, 


I” = SREP 4 one = = AS? — (AL + an) + PE pA + ardlle 


=|| PE ) + ay) — ay|lq > || age te + ay)Ilq = laly > v/2-0 > c/2—o. 


Hereby, we used that Een + a})|lq¢ = va/2 (because ae” yy ay||q > vr/2). Since 
every norm on a finite-dimensional space is equivalent there is a constant C such that 


ne — SO (A + ay) +S (a)l? > O7(c/2 — 0)? > 0. 


However, since by assumption yeh as” — SD (nr +a) + 8 (a) | — 0 asm — oo 
there must exist an mo such that Aj is empty for all m > mo. 
In the case that w, does not vanish we have 


[AE — UL +a), +UD(a)allo = AL? — 1 + wy) 29 (A + a)l 


= (Lt oH + wn — SO (A +a) 
(34) 


We claim that 
| + wayne? — SOA” + ay)llo > JA? — sa” + ay)lle (35) 


so that we can apply the argument for w, = 0 to conclude that Aim is empty for m 
sufficiently large. Let us omit for the moment all indexes A and m for the sake of simpler 
notation. We have 


| + w)h — SP (h + a)l - llh- SI (h + a)l} = 2w(h, h — SP (h + a)) + w* [Al (36) 
and furthermore, 
(h,h — SP (h +a)) = (h, Pt y(h +a) — a) 
= —(h +a — P% (h + a),a — P3, (h + a)) + |la - P%,(h + a)l > 0. 
(37) 
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Hereby, we used that a € BY (o) C BY (v/2) and the fact that (k— Piy (x), s= P%,(2)) <0 
for all k € BY(v/2) and x € R™. Thus, the term in (86) is non-negative and therefore our 


claim (85) holds. o 


Let us shortly comment on the condition that if v) is not bounded from below there is at 
least some c > 0 such that v, > c except for a finite set of indexes A. This condition is mainly 
relevant when considering also a minimization over (v,). Then the term X 6, (p) — va)? in 
the functional J(u, v) ensures that the sequence (pà — va) is contained in l g1/2. If @, and 
pa are bounded from below this implies that v) can be less than 1/2 min) p), say, only for 
finitely many A. 


5 Numerical Implementation and Error Analysis 


The scope of this section is twofold: We want to formulate an implementable version of the 
double-minimization algorithm and show its strong convergence. To this end we develop an 
error analysis. 


5.1 Numerical implementation 


Let us compose the two iterative algorithms described in and (28), respectively, into a 
unique scheme. 


Algorithm 1. JOINTSPARSE 


Input: Data vector (g) a; initial points u® € L(A, R), v® with 0 < yo < P), 
number Nmax of outer iterations, 
number of inner iterations Ln, n = 1,... , Nmax- 


Parameters: q€ [1,00], positive weights (83), (px), (wa) with w) > c > 0, 


such that ® and hence J are convex, see Proposition BJ 
q) 


Output: Approximation (u*,v*) of the minimizer of J$ p 
y09) = uO; 
for n := 0 to Nmax do 

for m := 0 to L, do 


utad = O, | 


(alr) + T* (g = Tul™™))) 
endfor 
y(r+1,0) = u™Ln); 
prt) ea ({ pa = garllu slg, ut) slg < 20rpr ) 
0, otherwise . seh 
endfor 


u* — yltmax-Lnmax) 


v* := ymax), 


Observe that each (inner) iteration of the above algorithm involves an application of 


T*T and of the thresholding operator U2, The latter can be applied fast. So if there is 
also a fast algorithm for the computation of T*T then each iteration can be done fast. 


25 


Our analysis ensures the (weak) convergence of this scheme only if the inner loop com- 
putes exactly the minimizer of J(u,v)) for fixed v\, i.e., if Ln = oo. Of course, this 
cannot be numerically realized, so we need to analyze what happens if the inner loop makes 
a small error in computing this minimizer. In other words, how large do we have to choose 
Nmax and Ln, n = 1,..., Nmax in order to ensure that we have approximately computed the 
minimizer u*,v* within a given error tolerance? 


5.2 Error analysis and strong convergence of JOINTSPARSE 


First of all we want to establish the convergence rate of the inner loop, i.e., the iterative 
thresholding algorithm of the previous Section. 

Proposition 5.1. Assume that w) > y > 0 for all A € A (implying that K(u) = K2 (u) 
is strictly convex) and ||T|| < 1. Set a := (14+ y)71|I — T*T|| < 1. Then the iterative 
thresholding algorithm 


yom) = y® (uam) + T* (g _ Tu) 


yr) jw 


converges linearly 


ur) — aD I < all) — al Jp, (38) 
Proof. Note that 
yh) = p : (uto) +T*(g— ul) . 


By non-expansiveness of s® (see Lemma[Z.4Jand its proof) we obtain 


ju" = ymt!) Ilo 


= eae 


vln) w 


(ate) +T*(g— Tul) ) -u8 


vln) w 


(egra 


1/2 
= (Zo + wy)? |] 92 Uu + T*(g — Tua) — SO (um + T*(g — rao) 
ACA 


< sup(1 + w) E — T°) (ul) — a™))Io << (149) EE FD | a — u)|[2 
AEA 


= aul) — ym) Ila. 
This establishes the claim. O 


Remark: Clearly, the error estimation in (88) holds also if one is only interested in analyzing 
the iterative thresholding algorithm from the last section (i.e. without doing the outer 
iteration). Then it might also be interesting to consider the case that w = 0. According 
to what we have proven in the previous section the algorithm still converges provided the 
weight v is bounded away from zero. However, then the error estimation has a useful 
meaning only if a = || — T*T|| < 1. So this applies if T*T is boundedly invertible. For 
a usual inverse problem, however, we will have a non-invertible T or at least one with 
unbounded inverse resulting in |Z — T*T|| = 1. So in this case we only know that the 
algorithm converges, but an error estimate does not seem to be available. 

For simplicity we restrict the following error analysis to the most interesting cases q € 
{1, 2,00}. We first need the following technical result. 
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Lemma 5.2. For q € {1,2,00} the projection P4 onto the ball BY(v) C R™ is a Lipschitz 
function with respect to v E€ R4. In particular, we have 


|P2(x) — P2(x)\e < Llu-—w| for allx € R™, (39) 
where L =1 for q=2 and L = M'/? for q € {1,00}. 
Proof. Let us start with q = 2. By distinguishing cases it is not difficult to show that 
P22) — Pi (#)|42" || < osa: 


For q = œœ we have P®(x) = (p%°(a))M@, where for y € R 


Py ly) = { : — sign(y)(|y| — v) E 

Since p?° can be interpreted as a projection onto the £2 ball in dimension 1, we obtain that 
Ips (y) — Po) < w — wl, 

and 


M 1/2 
[PZ (2) - PZE = (£ [po (ze) -aop < MPw- ul. 
(= 


The case q = 1 requires a bit more effort. By Lemma[Z.2](c) we have the following. Let 2;, 
denote the reordering of the entries of x by magnitude as in Lemma g3} Let n € {1,..., M} 
be the largest index satisfying 


1 n—1 
[zin] 2 TSi (Siea -») 
k=l 


Then 


sign(z;,) [< l 
(P;(2))i, = oD (Saal 0) ’ j=1,...,n, 
(Pi())j; =0, j=n+1,...,M. 


Observe first that for all x € R™ there exists €o > 0 such that for all 0 < € < eg the same 
n € {1,..., M} is the largest index satisfying 


1 n-1 
bial = Ay (Sobral oh), 
k=1 
For 0 < € < €9, a simple computation yields 


(Portela) — Pa (a)i; i T jeles 


€ 


This means that the map v > P}(z) is right-differentiable, i.e., the limit 


exists in R™. Moreover, it also follows that 
(Ps @))ylle = vn < VM. (40) 


To conclude the proof we use the following standard result. 


Lemma 5.3. Let f : R — R” andy: R — R be two continuous and right differentiable 
functions such that 


IF-@Il < vy), 
for allu E€ R. Then 


IIf(v) — Fw) < po) — ew), for all v > w. 


According to the notation of this latter lemma, let us set f(v) = P}(x) and y(v) = 
M*/?v. Since P}(x) is a continuous function with respect to v (in fact this is true for any 
projection onto convex sets, see [B7]), by (40) and an application of the lemma we conclude 
that ||P} (2) — P}(x)|| < M? Ww — w]. O 


Observe that the strict convexity of P9) (u,v) is equivalent to 0,w, > K/4, see Proposi- 
tion BJ] In the following Proposition we require the slightly stronger condition that 0,w) 
is bounded strictly away from «/4, at least for q = 1,2. 


Proposition 5.4. Let q € {1,2,co}. Assume that Owy > o > ¢q/4 for all X € A, where 
1 = M, $2 = 1, boo = VM, implying that 6 (u,v) and J(u,v) are strictly convex, see 
Proposition Z.I) Moreover, let us assume that w) > y > 0 for all `€ A. Suppose ||T|| < 1 
resulting in |I — T*T|| < 1. Set 


bq Pa 


:= sup ———————— AM < — 
6 = S aot TTN É 40 


< 1. (41) 


Then for each n € N one has the following error estimate 
Jub) — u*lla(A, R”) < Biju — u*|éo(A,R™) |). 
Proof. Let us consider the n-th iteration of the outer loop. We have 


ym) = y® 


vu(n) Ww 


(ul) 4 T*(g E Tu®™))), 
sa MiMi 


=y™ 


By the weak convergence of the double-minimization algorithm, also the minimum solution 
u* satisfies a similar relation, 


u* = UD (u + T*(g — Tu*)). 
Neea 
:=y* 
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Recall that UM, (y), := (1+ wy) 6 (yy). By non-expansiveness of gi (Lemma [Z.4) we 
have 


ur -le < UR wh -0@, Walle IE eh — UPL" )alle 
= (1+) yy? E e 


< (Loa) -TTI uk — ale + IO, a -UEa 


IA 


n Ww 
This implies 


[u — ušla 


IA 


(1 -= (1 +o- TNT UR, J"), -UL alle 
= (1+ wa E- TTI S e GD) — Sig WDllo- 


Recall from Lemma KI] that SP (2)(x) = = g— Phila x) where Pho 


projection of x onto the ¢,-ball of radius v/2. By Lemma B.2]we have that for any z € RY 


denotes the orthogonal 


@ (5) g0) a Fi) _ yt 
Som @) Sox (2l = Poe 22) - Prom lle < gle — “I 
So Si (z) is also Lipschitz in v. Let us recall that 


0 
y™ = Pr — Dy ow as u® Ma < 20) py 
à 0, otherwise . 


and i 
s. f a lle We\lla < 20px 
Uy = : 
0, otherwise . 


By distinguishing cases we can show that 


l (no) _ 


* (n,0) 
WP =al < g Megh- Melel < pelle”? — wile < led 


= wàll2; 


where R = 1 for q € {2,00} and R = M7"? for q = 1. Pasting the pieces together yields 


(noo) x Pq (n,0) x (n0) x 
u —u < ——— M —u uj’ = Ua || 
Summation over A € A completes the proof. o 


Let us combine the previous two results to obtain the error estimation for the finite 
algorithm, i.e., for Ln < oo. 


Theorem 5.5. Make the same assumptions as in Propositions [5.1] and [5-4] Choose Ln 
such that 


On i= (a™(1+8)+6)<8<1 for alln EN. 


(This is possible since a, 3 <1). Then we have linear convergence of our algorithm, i.e., 


ju) — uF lin < dnl — u* lja. 
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Proof. Using Proposition B-J]and Proposition 5.4] we get 


fu) ua < [ful — uta + Ju) — uf 
Satna aAA — HD hp + fll” i 
< očna (ful — ulla + [fut — uP) a) + BuO — uf 
< abt (Ijul29 — ut fp + Glut — u'll) + Allu- — wp 
S (a 3(1+ 8) +A) ue” u] 
This concludes the proof. O 


Remark: The last theorem shows that it is possible to choose the number Ln of inner 
iterations constant with respect to n. 


5.3 Strong convergence of the double-minimization algorithm 


Finally, we can establish the strong convergence of the double-minimization algorithm and 
conclude the full proof of Theorem BI] 


Corollary 5.6. Under the assumptions of Proposition [5.4] if the minimizer of J (u,v) 
for fixed v™ could be computed exactly, i.e., Ln = co for all n € N, then the outer loop 
converges with exponential rate, and we have 


lu — ula < Blu — u", 


where we have denoted here u™® := u'™°°), Moreover, the sequence v”) converges compo- 
nentwise and v™) — v* converges to 0 strongly in £2 6(A). 


Proof. The first part of the statement is a direct application of Proposition 5.4] It remains 
to show that v\) — v* converges to 0 strongly in (A). Using that all norms on R™ are 
equivalent it follows that 


AY lv) -o = Y lele- ee llal? < So uk — wl 1? 


AEA AEA AEA 
< CÙ là -uÇ = Clu" — uea, RIP. 
ACA 
Thus, v(”) — v(°) converges also strongly in ba (A). o 


6 Color image reconstruction 


With this section we illustrate the application of the algorithms for color image recovery. 
The scope is to furnish a qualitative description of the behavior of the scheme. In a sub- 
sequent work we plan to provide a finer quantitative analysis in the context of distributed 
compressed sensing [B]. 

We begin by illustrating an interesting real-world problem occurring in art restoration. 
On 11% March 1944, a group of bombs launched from an Allied airplane hit the famous 
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Italian Eremitani’s Church in Padua, destroying it together with the inestimable frescoes 
by Andrea Mantegna et al. contained in the Ovetari Chapel. Details on “the state of the 
art” can be found in [28} [27]. In 1920 a collection of high quality gray level pictures of these 
frescoes has been made by Alinari. The only color images of the frescoes are dated to 1940, 
but unfortunately their quality (i.e., the intrinsic resolution of the printouts) is much lower, 
see Figure [6] Inspired by the fresco application, we model the problem of the recovery of 
a high resolution color image from a low resolution color datum and a high resolution gray 
datum. We will implement the solution to the model problem as a non-trivial application 
of the algorithms we have presented in this paper. 


6.1 Color images, curvelets, and joint sparsity 


Let us assume that the color images are encoded into YIQ channels. The Y component 
represents the luminance information (gray level), while I and Q give the chrominance 
information. Of course, one may also choose a different encoding system, e.g., RGB or 
CMYK. Clearly, the color image f = (fi, fo, f3) can be represented as a 3-channel signal. 
In order to apply our algorithm, we need to fix a frame for which we can assume color 
images being jointly sparse. 

It is well-known that curvelets [9| are well-suited for sparse approximations of curved sin- 
gularities. A natural image can in fact be modelled as a function which is piecewise smooth 
except on a discontinuity set, the latter being described as the union of rectifiable curves. 
Moreover, there are fast algorithms available for the computation of curvelet coefficients of 
digital images [8]. 

In the following, let us assume that a color image f is encoded into a vector of curvelet 


coefficients ye The image can be reconstructed by the synthesis formula 


fe) p93 (= uv) ; 
=1,2,3 


AEA 


where {w, : A € A} is the collection of curvelets. The index A consists of 3 different 
parameters, À = (j, p, k), where j corresponds to scale, p to a rotation, and k to the spatial 
location of the curvelet Y). We do not enter in further details, especially of the discrete 
and numerical implementation, which one can find in R B]. 

Let us instead observe that significant curvelet coefficients u) = (uj, uf, u3) will appear 
simultaneously at the same A € A for all the channels, as soon as the corresponding curvelet 
overlaps with a (curved) singularity (appearing simultaneously in all the channels), and is 
approximately tangent to it. This justifies the joint sparsity assumption for color images 
with respect to curvelets. 


6.2 The model of the problem 


The datum of our problem is a three-channel signal g = (91, 92, 93) € (Zn, R’) where gi, 
i = 2,3, are the low resolution chrominance channels I and Q, and gı is the high resolution 
gray channel Y. We assume that g was produced by g = Tu where u = (ul, u?, u3) are the 
curvelet coefficients of the three channels of the high resolution color image that we want 
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Figure 1: Left: The low-resolution color image is here presented after Gaussian filtering 
and downsampling. In the numerical experiments the I and Q channels are used. Right: 
The high resolution gray level image encodes several morphological information useful to 
recover the high resolution color image. 


to reconstruct. The operator T = (Ty ;)¢,j;=1,2,3 can be expressed by the matrix 


F o 0 
T=| 0 AF 0 |. (42) 
0 0 AF 


Here, A is the linear operator that transforms the high-resolution image into the low reso- 
lution image. In particular, A can be taken as a convolution operator (with a Gaussian for 
instance) followed by downsampling. Eventually, we may assume a suitable scaling in order 
to make ||T|| < 1, and a different weighting of the gray channel and the I,Q channel in the 
discrepency term. Since A is not invertible, also the operator T is not invertible, and the 
minimization of 7 (u) requires a regularization. Clearly, for this task we use the functional 
K defined in (£) or J defined in (). 


6.3 On the choice of the parameters 


What remains to clarify is the choice of the parameters w), Oà, and p,. The parameter 
w) > y > 0 has been introduced for the sole purpose to make J strictly convex. A large 
value of this parameter actually produces an image u which is significantly blurred and no 
information about edges is recovered. Thus, we rather put w, = y = € > 0 small. Due 
to the convexity requirements (see Subsection 2.3), we select 0), ~ u, The choice of p) 
requires a deeper understanding of the information encoded by the curvelet coefficients. 

Indeed, in [9] it was observed that those curvelets that overlap with a discontinuity 
decay like ||ualla S 273/45 while the others satisfy luall S 2-3/2) (where j denotes the 
scale). Since we want to recover joint discontinuities we may choose py := Pjpk ~ 277% 
with s € [3/4, 3/2]. By this choice and by the locations A for which v) = 0 will indicate 
a potential joint discontinuity. 

Of course, this is just one possible choice of the parameters and further information 
might be extracted from the joint sparsity pattern indicated by v, by the use of different 
parameters. We believe that a deeper study of the characterization of the morphological 
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Figure 2: The 4 error between the original color image and the iterations of the algorithm 
is shown for different values of q = 1,2,00. We have considered Ln = 105 and Nmax = 1, 
the numbers of inner and outer iterations respectively. We have fixed here w) = 0, 8) = 10, 
and py = 20 x 279, 


properties of signals encoded by frames (e.g., curvelets and wavelets) is fundamental for the 
right choice of these parameters. We refer to BO} for deeper insights in this direction, 
concerning fine properties of functions encoded by the distribution of wavelet coefficients. 


6.4 Numerical experiments 


According to the previous subsections, we illustrate here the application of JOINTSPARSE 
for the recovery of a high resolution color image from a low resolution color datum and a 
high resolution gray datum. In Figure[]] we illustrate the data of the problem. In this case 
the resolution of the color image has been reduced by a factor of 4 in each direction by 
using a Gaussian filter and a downsampling. We have conducted several experiments for 
different choices of q € {1,2,00}, with fixed parameters as indicated in Subsection [6.3] We 
have chosen Ln = 105 and Nmax = 1, as well as Ln = 7 and Nmax = 15 (the numbers of inner 
and outer iterations, respectively). In the first case, only the minimization of J(u, v®)) with 
respect to u has been performed, i.e., no iterative adaptation of the joint sparsity pattern 
indicated by v occurred. In order to estimate the different behavior depending of the pa- 
rameters above, we have evaluated at each iteration the f2-error between the reconstructed 
I and Q color channels and the original I and Q color channels. Figures P] and B] indicate 
that the error decreases for increasing values of q. This means that the increased coupling 
due to the g-parameter is significant in order to improve the recovery. Recall that the choice 
q = 1 does not induce any coupling between channels. 

This coupling effect due to q > 1 is even more evident in Figure B] where the adaptation 
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Figure 3: The b error between the original color image and the iterations of the algorithm 
is shown for different values of q = 1,2,00. We have considered Ln = 7 and nmax = 15, the 
numbers of inner and outer iterations respectively. We have fixed here w) = 1/20, 6) = 10, 
and p) = 20 x 27), 


of the weight v occurs. The left and the central pictures in the second row of Figure 
show a reduced color distortion at edges, passing from the case q = 1 (without coupling) to 
the case q = oo respectively, and consequently a better edge resolution. Nevertheless, the 
differences are not so remarkable. This is due to the fact that, although the functional J 
promotes coupling at edges, it does not necessarily enforce a significant edge enhancement. 
Thus, we may modify the functional by adding an additional total variation constraint on 
the I and Q channels: 


Ipy (u,v) := J(u, v) + (|Fulpy + |Fuelry) - 


The effect of this modification is to promote edge enhancing together with their simultaneous 
coupling through different channels. For the minimization of Jpy we use a heuristic scheme 
as in [25], by alternating iterations for the minimization of J and for the minimization of 
(|Fu?|py + |Fu3|py), compare also [19]. The corresponding results are shown in Figure 
[| where the effect of the coupling (for the cases q = 2,00) is further enhanced. The right 
picture in the second row of Figure [B] shows the result of the reconstruction in this latter 
case. The edges are perfectly recovered. 

These numerical experiments confirm that the use of the joint sparsity measure ®( 
associated to the curvelet representation can improve significantly the quality of the recon- 
structed color image. Better results are achieved by choosing q = oo and by the adap- 
tive choice of weights as indicators of the sparsity pattern. Further improvements can be 
achieved by channelwise edge enhancing, e.g., via total variation minimization. An appli- 
cation to the real case of the art frescoes is illustrated in Figure [6] 
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Figure 4: The @) error between the original color image and the iterations of the algorithm 
is shown for different values of q = 1,2,00. We have considered Ln = 7 and nmax = 15, the 
numbers of inner and outer iterations respectively. We have fixed here w) = 1/20, 6) = 10, 
and p) = 20 x 2-/. Further iterations of TV minimization are added in the outer loop to 
enforce edge enhancing. 


7 Final Remarks 


1. If the index set A is infinite then T*T is represented as a biinfinite matrix and thus 
its evaluation might not be exactly numerically implementable. In a subsequent work we 
will consider the case ##A = oo and the treatment of sparse (approximate) evaluations of 
biinfinite matrices in order to realize fast and convergent schemes also in this situation, 
compare also [88] L4 E5]. 

2. To exploit the optimal performance of the scheme, an extensive campaign of numerical 
experiments should be conducted in order to further refine the choice of parameters. It is also 
crucial to investigate the deeper relations among the parameter p), the multifractal analysis 
as, e.g., in [30], and morphological image analysis. In particular, the parallel between the 
functional J and the T-approximation of the Mumford-Shah functional by Ambrosio and 
Tortorelli [I] [5] is suggestive: 


1 
F(u, v) = | u- gas | wf(Vu)de+ | e|Vulde + f G + ra -»)*) dx, 
Q Q Q Q 
~T (u) ~Da rallulle ~D, wy llull? ~Da balan)? 


where f is a suitable polyconvex function, e.g., f(Vu) = (|Vul? + [ux x uy) = (|Vul? + 
jadjo(Vu)|), adj.(A) is the matrix of all 2 x 2 minors of A. The minimization of this 
term enforces that derivatives of different channels are large only in the same directions. 
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Figure 5: First row. Left: Portion of the low resolution color image. Center: Portion of 
the reconstructed color image by Gaussian interpolation and substitution of the Y channel 
with the gray level datum. Evident color artifacts appear at edges. Right: Portion of the 
original color image. Second row. Left: Portion of the reconstructed color image for q = 1, 
Ly = 105, and nmax = 1. Center: Portion of the reconstructed color image for q = ov, 
Ln = 7, and nmax = 15. Right: Portion of the reconstructed color image for q = oo, Ln = 7, 
Nmax = 15, and TV minimization. 


According to the specific choices of p) to indicate the discontinuity set of u, and for w) = € 
and #, = Ł, we may investigate the behavior of the functional J for € — 0 and its 
relation with the Mumford-Shah functional. The term 5°) (px — vy)? essentially counts 
the number of curvelets that, from a certain scale j on such that 277 ~ £, do overlap with 
the discontinuity set and are nearly tangent to the singularity. We conjecture that for € — 0 
and for a rectifiable curved discontinuity, this term estimates the length of the discontinuity. 

3. While we were finishing this paper, we have been informed by G. Teschke of the 
results in [19]. In this manuscript the authors consider linear inverse problems where the 
solution is assumed to fulfill some general 1-homogeneous convex constraint. They develop 
an algorithm that amounts to a projected Landweber iteration and that provides an iterative 
approach to the solution of this inverse problem. In particular for the case w = (w,), = 0, 
some of our results stated in Section 4 can be reformulated in this more general setting and 
therefore derived from [19]. However, for w Æ 0 the sparsity measure vy, as in (26) is not 
1-homogeneous and the elaborations in Section 4 are needed. Moreover, for the relevant 
cases g = 1,2,00, we express explicitly the projection Pa Due to their generality, the 
results in [19] do not provide concrete recipes to compute such projections. 


Figure 6: Left: Low quality color image of the fresco dated to 1940. Center: High quality 
gray image of the fresco dated to 1920. Some details are not visible in the color version. 
Right: The reconstructed image after 6 outer iterations with 7 inner iterations each, for 
q= œ. The final Y channel is substituted with the high resolution gray level datum. The 
discountinuities are enhanced and no artifact colors appear. 


8 Conclusion 


We have investigated joint sparsity measures with respect to frame expansions of vector 
valued functions. These sparsity measures generalize approaches valid for scalar functions 
and take into account common sparsity patterns through different channels. We have an- 
alyzed linear inverse problems with joint sparsity regularization as well as their efficient 
numerical solution by means of a novel algorithm based on thresholded Landweber iter- 
ations. We have provided the convergence analysis for a wide range of parameters. The 
role of the joint sparsity measure is twofold: to tighten the characterization of solutions 
of interest and to extract significant morphological properties which are a common feature 
of all the channels. By numerical applications in color image restoration, we have shown 
that joint sparsity significantly outperforms uncoupled constraints. We have presented the 
results of an application to a relevant real-world problem in art restoration. The wide range 
of applicability of our approach includes several other problems with coupled vector valued 
solutions, e.g., neuroimaging and distributed compressed sensing. 
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